On Penalty Methods for Nonconvex Bilevel Optimization and First-Order Stochastic Approximation
In this work, we study first-order algorithms for solving Bilevel Optimization (BO) problems where the objective functions are smooth but possibly nonconvex in both levels and the variables are restricted to closed convex sets. As a first step, we study the landscape of BO through the lens of penalty methods, in which the upper- and lower-level objectives are combined in a weighted sum with penalty parameter $\sigma > 0$. In particular, we establish a strong connection between the penalty function and the hyper-objective by explicitly characterizing the conditions under which the values and derivatives of the two must be $O(\sigma)$-close. A by-product of our analysis is an explicit formula for the gradient of the hyper-objective when the lower-level problem has multiple solutions under minimal conditions, which could be of independent interest. Next, viewing the penalty formulation as an $O(\sigma)$-approximation of the original BO problem, we propose first-order algorithms that find an $\epsilon$-stationary solution by optimizing the penalty formulation with $\sigma = O(\epsilon)$. When the perturbed lower-level problem uniformly satisfies the small-error proximal error-bound (EB) condition, we propose a first-order algorithm that converges to an $\epsilon$-stationary point of the penalty function, using in total $O(\epsilon^{-3})$ and $O(\epsilon^{-7})$ accesses to first-order (stochastic) gradient oracles when the oracle is deterministic and when the oracles are noisy, respectively. Under an additional assumption on the stochastic oracles, we show that the algorithm can be implemented in a fully {\it single-loop} manner, i.e., with $O(1)$ samples per iteration, and achieves the improved oracle complexities of $O(\epsilon^{-3})$ and $O(\epsilon^{-5})$, respectively.
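The penalty viewpoint above can be illustrated with a toy sketch (this is not the paper's algorithm; the quadratic objectives, penalty weight, and step size below are all invented for the example). A bilevel problem $\min_x f(x, y^*(x))$ with $y^*(x) = \arg\min_y g(x, y)$ is replaced by joint gradient descent on the penalty function $f + \sigma g$, whose stationary point is $O(1/\sigma)$-close to a stationary point of the hyper-objective:

```python
# Toy penalty reformulation of a bilevel problem (invented for illustration):
#   upper level: f(x, y) = (x - 1)^2 + y^2
#   lower level: g(x, y) = (y - x)^2, minimized at y*(x) = x
# The hyper-objective F(x) = f(x, x) is minimized at x = 1/2.

sigma, lr = 100.0, 0.004  # penalty weight and step size, chosen for this sketch
x, y = 0.0, 0.0
for _ in range(10000):
    # gradients of the penalty function p(x, y) = f(x, y) + sigma * g(x, y)
    gx = 2.0 * (x - 1.0) - 2.0 * sigma * (y - x)
    gy = 2.0 * y + 2.0 * sigma * (y - x)
    x, y = x - lr * gx, y - lr * gy

# x lands O(1/sigma)-close to the hyper-objective minimizer 1/2, and y is
# O(1/sigma)-close to the lower-level solution y*(x) = x.
print(x, y)
```

With $\sigma = 100$ the joint stationary point sits within about $1/\sigma$ of $(1/2, 1/2)$, matching the $O(\sigma^{-1})$-closeness the abstract describes (stated there in terms of $\sigma$-weighted objectives).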
Tractable Optimality in Episodic Latent MABs
We consider a multi-armed bandit problem with $M$ latent contexts, where an agent interacts with the environment for an episode of $H$ time steps. Depending on the length of the episode, the learner may not be able to accurately estimate the latent context. The resulting partial observation of the environment makes the learning task significantly more challenging. Without any additional structural assumptions, existing techniques for tackling partially observed settings imply that the decision maker can learn a near-optimal policy with $O(A)^H$ episodes, but do not promise more. In this work, we show that learning with {\em polynomial} samples in $A$ is possible. We achieve this by using techniques from experiment design. Then, through a method-of-moments approach, we design a procedure that provably learns a near-optimal policy with $O(\mathrm{poly}(A) + \mathrm{poly}(M, H)^{\min(M, H)})$ interactions. In practice, we show that the moment matching can be formulated via maximum likelihood estimation. In our experiments, this significantly outperforms the worst-case guarantees, as well as existing practical methods.
Comment: NeurIPS 2022
Statistical learning with latent variables: mixture models and reinforcement learning
Statistical learning with missing or hidden information is ubiquitous in many practical problems. For example, the success of a certain medical treatment can largely depend on the unknown conditions of patients, or some parts of the data could be censored to protect the privacy of individuals. In contrast to problems with full information, which often have simple and tractable solutions, the existence of such latent variables often introduces intractable non-convexity, complicating the landscape of the problem from both statistical and computational aspects. While both aspects are important, this thesis puts more emphasis on understanding the statistical challenges raised by latent variables. This thesis consists of two main parts. In Part I, we consider parameter estimation in finite mixture models; in particular, we study the Expectation-Maximization (EM) algorithm for learning the maximum likelihood estimator (MLE) given i.i.d. samples from mixtures of Gaussian distributions. In Part II, we turn our focus to reinforcement learning (RL) and study the sample complexity of learning a near-optimal policy when an important context of the environment is not observable. We now give a brief overview of these two parts.

Part I. The first part of the thesis studies the convergence and statistical behavior of EM in two fundamental settings: one concerns a mixture consisting of two symmetric components, and the other concerns a mixture of an arbitrary number of well-separated components. Chapter 1 describes background and notation relevant to the two settings. In Chapter 2, we focus on a mixture of two linear regressions (2-MLR) and completely characterize the global optimality of EM: we show that, starting from any randomly initialized point, the EM algorithm converges to the true parameter at the known minimax statistical rates in all parameters under all signal-to-noise ratio (SNR) regimes.
In Chapter 3, we focus on two canonical mixture problems: a mixture of K ≥ 3 well-separated Gaussians (K-GMM) and a mixture of K ≥ 3 linear regressions (K-MLR). For these problems, we provide a rigorous (local) convergence guarantee for the EM algorithm when the mixture components are well separated. Notably, we establish the minimax statistical rate of EM (and thus of the MLE) in all problem parameters for these two examples.

Part II. In the second part of the thesis, we consider learning a near-optimal policy in Latent Markov Decision Processes (LMDPs). In an LMDP, an MDP is randomly drawn from a set of M possible MDPs at the beginning of the interaction, but the identity of the chosen MDP is not revealed to the agent. As a starting point, we show that a general instance of LMDPs with S states and A actions requires at least Ω((SA)^M) episodes to even approximate the optimal policy. This lower bound suggests that the problem is tractable only under additional assumptions, either in the regime where the number of contexts grows, M = ω(1), or when the number of contexts is small, M = O(1). We give a more detailed overview and background in Chapter 4. In Chapter 5, we first prove the Ω((SA)^M) lower bound in the absence of further assumptions. Then we focus on the M = ω(1) regime, where we consider sufficient assumptions under which learning good policies requires a polynomial number of episodes in M. We show that the key link is a notion of separation between the MDP system dynamics. With sufficient separation, we provide an efficient algorithm with a local guarantee, i.e., a sublinear regret guarantee when we are given a good initialization. The need for initialization can be removed if a certain statistical-sufficiency assumption holds. In Chapter 6, we consider learning a near-optimal policy in reward-mixing MDPs (RMMDPs), a special case of LMDPs with common state-transition probabilities across contexts.
Without any assumptions, no upper bound on the sample complexity is known even for this setting. We first study the problem of learning a near-optimal policy for two reward-mixing MDPs, i.e., M = 2. Unlike existing approaches that rely on strong assumptions on the dynamics, we make no assumptions and study the problem in full generality. Indeed, with no further assumptions, even for two switching reward models, the problem requires several new ideas beyond existing algorithmic and analysis techniques for efficient exploration. Finally, in Chapter 7, we show that the moment-matching idea can be applied when the system is in the simpler latent multi-armed bandit (LMAB) setting (equivalently, when S = 1). Unlike general LMDPs, which suffer an Ω(A^M) lower bound, we show that polynomial sample complexity in A is possible in LMABs.
Electrical and Computer Engineering
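The moment-matching idea for the latent bandit setting can be sketched in a stripped-down toy (not the thesis's estimator): assume M = 2 equally likely contexts and noiseless per-context rewards, so each arm's observed rewards follow an equal-weight two-point mixture whose support can be recovered from the first two moments. The arm means below are invented for the example.

```python
import math
import random

random.seed(0)

# Hypothetical toy: M = 2 equally likely latent contexts; an arm's reward
# equals its context-dependent mean (noiseless), so per-arm rewards are an
# equal-weight two-point mixture.
true_means = {1: [0.2, 0.9], 2: [0.8, 0.1]}  # context -> per-arm means
n_arms, n_episodes = 2, 20000

samples = {a: [] for a in range(n_arms)}
for _ in range(n_episodes):
    c = random.choice([1, 2])     # latent context redrawn each episode
    a = random.randrange(n_arms)  # pull a uniformly random arm
    samples[a].append(true_means[c][a])

recovered = {}
for a in range(n_arms):
    xs = samples[a]
    m1 = sum(xs) / len(xs)                 # empirical first moment
    m2 = sum(x * x for x in xs) / len(xs)  # empirical second moment
    # For an equal-weight two-point mixture {u, v}: m1 = (u + v) / 2 and
    # m2 = (u^2 + v^2) / 2, hence u, v = m1 -/+ sqrt(m2 - m1^2).
    gap = math.sqrt(max(m2 - m1 * m1, 0.0))
    recovered[a] = sorted([m1 - gap, m1 + gap])
```

Note the per-arm marginals alone do not reveal which recovered value belongs to which context across arms; resolving that pairing (and handling reward noise and unknown mixing weights) is where the heavier machinery of the full method-of-moments analysis comes in.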
Clustering and Classification [Project Title from Cover]
DTRT13-G-UTC58
The Expectation-Maximization algorithm is perhaps the most broadly used algorithm for inference in latent variable problems. A theoretical understanding of its performance, however, largely remains lacking. Recent results established that EM enjoys global convergence for Gaussian Mixture Models. For Mixed Regression, however, only local convergence results had been established, and those only for the high-SNR regime. We show here that EM converges for mixed linear regression with two components (it is known not to converge for three or more), and moreover that this convergence holds under random initialization.
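A minimal sketch of the EM iteration for symmetric two-component mixed linear regression, in one dimension for brevity (the parameter values, sample size, and initialization below are invented for the example): data follow y = s · β·x + noise with the sign s equally likely to be ±1, and EM recovers β up to sign.

```python
import math
import random

random.seed(1)

# Generate data from a symmetric 2-component mixed linear regression:
# y = s * beta_true * x + noise, with s uniform on {-1, +1}.
beta_true, noise_sd, n = 2.0, 0.5, 5000
data = []
for _ in range(n):
    x = random.gauss(0.0, 1.0)
    s = random.choice([-1.0, 1.0])
    data.append((x, s * beta_true * x + random.gauss(0.0, noise_sd)))

beta = 0.5  # arbitrary initialization; the result above says random init suffices
var = noise_sd ** 2
for _ in range(50):
    # E-step folded into the M-step: if w_i is the posterior probability of
    # the +1 component, then 2*w_i - 1 = tanh(y_i * beta * x_i / var), and
    # the weighted least-squares M-step reduces to the ratio below.
    num = sum(math.tanh(y * beta * x / var) * y * x for x, y in data)
    den = sum(x * x for x, y in data)
    beta = num / den

print(abs(beta))  # close to beta_true (the model is identifiable only up to sign)
```

The tanh form avoids explicitly computing the per-sample responsibilities (and the numerical overflow a naive sigmoid of large arguments would cause); it is algebraically identical to the standard E-step/M-step pair for this symmetric model.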